C244.00/N



Title : 



Improvements in or Relating to Starch Content of Plants 



Field of the Invention 

This invention relates to novel nucleic acid sequences, vectors and host cells comprising 
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering 
a host cell by introducing the nucleic acid sequence(s) of the invention.

Background to the Invention 

Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18].
Starch branching enzyme (SBE) hydrolyses α-1,4 linkages and rejoins the cleaved glucan,
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical
properties of starch are strongly affected by the relative abundance of amylose and 
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and 
quality of starches produced in plant systems. 

Starches are commercially available from several plant sources including maize, potato and 
cassava. Each of these starches has unique physical characteristics and properties and a 
variety of possible industrial uses. In maize there are a number of naturally occurring 
mutants which have altered starch composition such as high amylopectin types ("waxy" 
starches) or high amylose starches but in potato and cassava no such mutants exist on a 
commercial basis as yet. 



Genetic modification offers the possibility of obtaining new starches which may have novel 
and potentially useful characteristics. Most of the work to date has involved potato plants
because they are amenable to genetic manipulation i.e. they can be transformed using 
Agrobacterium and regenerated easily from tissue culture. In addition many of the genes 
involved in starch biosynthesis have been cloned from potato and thus are available as 
targets for genetic manipulation, for example, by antisense inhibition of expression or 
sense suppression. 

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its 
starch- filled roots are used both as a food source and increasingly as a source of starch. 
Cassava is a high yielding perennial crop that can grow on poor soils and is also tolerant 
of drought. Cassava starch, being a root-derived starch, has properties similar but not
identical to potato starch and is composed of 20-25% amylose and 75-80% amylopectin
(Rickard et al., 1991 Trop. Sci. 31, 189-207). Some of the genes involved in starch
biosynthesis have been cloned from cassava, including starch branching enzyme I (SBE
I) (Salehuzzaman et al., 1994 Plant Science 98, 53-62), and granule bound starch synthase
I (GBSS I) (Salehuzzaman et al., 1993 Plant Molecular Biology 23, 947-962) and some
work has been done on their expression patterns, although only in in vitro grown plants
(Salehuzzaman et al., 1994 Plant Science 98, 53-62).

In most plants studied to date e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res. 
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175,
270-279), two forms of SBE have been identified, each encoded by a separate gene. A 
recent review by Burton et al. (1995 The Plant Journal 7, 3-15) has demonstrated that the
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes
of the same class from different plants may exhibit greater similarity than enzymes of
different classes from the same plant. In their review, Burton et al. termed the two
respective enzyme families class "A" and class "B", and the reader is referred thereto (and
to the references cited therein) for a detailed discussion of the distinctions between the two 
classes. One general distinction of note would appear to be the presence, in class A SBE 
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The 
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE
molecules, which terms are to be interpreted accordingly.

Many organisations have interests in obtaining modified Cassava starches by means of
genetic modification. This is impossible to achieve, however, unless the plant is amenable
to transformation and regeneration and the starch biosynthesis genes which are to be
targeted for modification have been cloned. The production of transgenic cassava plants has
only recently been demonstrated (Taylor et al., 1996 Nature Biotechnology 14, 726-730;
Schopke et al., 1996 Nature Biotechnology 14, 731-735; and Li et al., 1996 Nature
Biotechnology 14, 736-740). The present invention concerns the identification, cloning
and sequencing of a starch biosynthetic gene from Cassava, suitable as a target for genetic 
manipulation. 

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide 
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective 
portion of amino acid residues 1-836 of the sequence shown in Figure 4. The nucleic acid 
is conveniently in substantial isolation, especially in isolation from other naturally 
associated nucleic acid sequences. 

An "effective portion" of amino acid residues 1-836 may be defined as a portion which 
retains sufficient SBE activity when expressed in E. coli KV832 to complement the 
branching enzyme mutation therein. The amino acid sequence shown in Figure 4 includes 
the N terminal transit peptide, which comprises about the first 50 amino acid residues. 
As those skilled in the art will be well aware, such a transit peptide is not essential for 
SBE activity. Thus the mature polypeptide, lacking a transit peptide, may be considered 
as one example of an effective portion of residues 1-836. 

Other effective portions may be obtained by effecting minor deletions in the amino acid 
sequence, whilst substantially preserving SBE activity. Comparison with known class A 
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the 
art to identify regions of the polypeptide which are less well conserved and so amenable 
to minor deletion, or amino acid substitution (particularly, conservative amino acid 
substitution) whilst substantially preserving SBE activity. Such less well-conserved 
regions are generally found in the N terminal 179 amino acid residues (up to the triple
proline "elbow" at residues 180-183) and in the last 50 residues or so of the C terminal, 
and in particular in the acidic tail of the C terminal. 

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained 
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular 
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about 
position 697, which sequence appears peculiar to an isoform of the SBE class A enzyme 
of cassava, other class A SBE enzymes having the conserved sequence DA D/E Y (Burton 
et al., 1995 cited above).

In a particular embodiment the nucleic acid comprises a portion of nucleotides 21 to 2531 
of the nucleic acid sequence shown in Figure 4, or a functionally equivalent nucleic acid 
sequence. Such functionally equivalent nucleic acid sequences include sequences which 
encode substantially the same polypeptide, but which differ in nucleotide sequence from 
that shown in Figure 4 by virtue of the degeneracy of the genetic code. For example, a 
nucleic acid sequence may be altered (e.g. "codon optimised") for expression in a host 
other than cassava, such that the nucleotide sequence differs substantially whilst the amino 
acid sequence of the encoded polypeptide is unchanged. Other functionally equivalent 
nucleic acid sequences are those which will hybridise under stringent hybridisation 
conditions (e.g. as described by Sambrook et al., Molecular Cloning: A Laboratory
Manual, CSH, i.e. washing with 0.1xSSC, 0.5% SDS at 68°C) with the sequence shown
in Figure 4. Figure 10 shows a functionally equivalent sequence designated "125 + 94",
which includes a region corresponding to the 3' coding portion of the sequence in Figure 
4. 
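The effect of codon degeneracy described above can be illustrated with a short Python sketch. The two 21-nucleotide fragments and the partial codon table below are hypothetical and chosen only for illustration; they are not taken from Figure 4. The sketch shows two nucleotide sequences that differ at several positions yet encode the same peptide.

```python
# Partial codon table: only the codons used in this illustration are listed.
CODON_TABLE = {
    "ATG": "M",
    "GAT": "D", "GAC": "D",
    "AAA": "K", "AAG": "K",
    "TAT": "Y", "TAC": "Y",
    "GAA": "E", "GAG": "E",
}


def translate(dna):
    """Translate a DNA string codon by codon (one-letter amino acid code)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical fragment and a recoded version using synonymous codons.
native = "ATGGACAAGGATATGTATGAA"
recoded = "ATGGATAAAGACATGTACGAG"

if __name__ == "__main__":
    print(translate(native), translate(recoded))
    print(count_differences(native, recoded), "nucleotide differences")
```

In this toy case five of the 21 nucleotides differ while the encoded peptide is unchanged, which is the sense in which a codon optimised sequence may remain functionally equivalent.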

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more 
preferably 300-600bp, and will exhibit at least 90% identity (preferably at least 95% 
identity) with the corresponding region of the DNA sequence shown in Figures 4 or 10.
Those skilled in the art will readily be able to conduct a sequence alignment between the
putative functionally equivalent sequence and those detailed in Figures 4 or 10; the
identity of the two sequences is to be compared in those regions which are aligned by
standard computer software, which aligns corresponding regions of the sequences.
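By way of illustration only, the identity comparison over aligned regions might be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned by separate software and are supplied as equal-length strings with "-" marking gaps. Ignoring gapped columns when counting identity is an assumption made here; the specification does not state how gaps are to be treated. The function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, number of compared columns) for two
    pre-aligned sequences. Columns containing a gap in either sequence
    are skipped (an assumption of this sketch)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0, 0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs), len(pairs)


def meets_threshold(aligned_a, aligned_b, min_length=200, min_identity=90.0):
    """Check the 'at least 200bp and at least 90% identity' criterion."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return length >= min_length and identity >= min_identity


if __name__ == "__main__":
    # Toy data: a 240-base sequence and a copy with 12 substitutions.
    reference = "ACGT" * 60
    variant = list(reference)
    for i in range(0, 240, 20):
        variant[i] = "T"
    variant = "".join(variant)
    print(percent_identity(reference, variant))
    print(meets_threshold(reference, variant))
```

The default thresholds correspond to the lower bounds given in the text; the preferred 95% identity can be tested by passing `min_identity=95.0`.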

In particular embodiments the nucleic acid sequence may alternatively comprise a 5' 
and/or a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and 
4. Figure 9 includes a 3' UTR, as nucleotides 688-1044 and Figure 10 includes 3' UTR 
as nucleotides 1507-1900 (which nucleotides correspond to the first base after the "stop" 
codon to the base immediately preceding the poly (A) tail). Any one of the sequences 
defined above, or a functional equivalent thereof (as defined by hybridisation properties, 
as set out in the preceding paragraph), could be useful in sense or anti-sense inhibition of 
corresponding genes, as will be apparent to those skilled in the art. It will also be 
apparent to those skilled in the art that such regions may be modified so as to optimise 
expression in a particular type of host cell and that the 5' and/or 3' UTRs could be used 
in isolation, or in combination with a coding portion of the sequence of the invention. 
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired. 

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically 
comprise a selectable marker and may allow for expression of the nucleic acid sequence 
of the invention. Conveniently the vector will comprise a promoter (especially a promoter 
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or 
more regulatory signals known to those skilled in the art. 

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide 
comprising an effective portion of amino acid residues 1-836 of the amino acid sequence
shown in Figure 4. The polypeptide is conveniently one obtainable from cassava, although 
it may be derived using recombinant DNA techniques. The polypeptide is preferably in
substantial isolation from other polypeptides, especially in isolation from polypeptides of 
plant origin. The polypeptide may have amino acid residues NSKH at about position 697, 
instead of the sequence DA D/E Y found in other SBE class A polypeptides. The 
polypeptide may be used in a method of modifying starch in vitro, the method comprising 
treating starch under suitable conditions (of temperature, pH etc.) with an effective amount 
of the polypeptide.

Those skilled in the art will appreciate that the disclosure of the present specification can
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 90% sequence identity with the 
corresponding region of the DNA sequence shown in Figures 4, 9 or 10, operably linked 
in the sense or (preferably) in the anti-sense orientation to a suitable promoter active in 
the host cell, and causing transcription of the introduced nucleic acid sequence, said 
transcript and/or the translation product thereof being sufficient to interfere with the 
expression of a homologous gene naturally present in said host cell, which homologous 
gene encodes a polypeptide having SBE activity. The altered host cell is typically a plant 
cell, such as a cell of a cassava, banana, potato, sweet potato, tomato, pea, wheat, barley, 
oat, maize, or rice plant. 

Desirably the method further comprises the introduction of one or more nucleic acid 
sequences which are effective in interfering with the expression of other homologous gene 
or genes naturally present in the host cell. Such other genes whose expression is inhibited 
may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated to SBE 
II. 

Those skilled in the art will be aware that both anti-sense inhibition, and "sense 
suppression" of expression of genes, especially plant genes, has been demonstrated (e.g. 
Matzke & Matzke 1995 Plant Physiol. 107, 679-685). 

It is believed that antisense methods are mainly operable by the production of antisense 
mRNA which hybridises to the sense mRNA, preventing its translation into functional 
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al., 1988
PNAS 85, 8805-8809; Van der Krol et al., Mol. Gen. Genet. 220, 204-212). Sense
suppression also requires homology between the introduced sequence and the target gene,
but the exact mechanism is unclear. It is apparent however that, in relation to both
antisense and sense suppression, neither a full length nucleotide sequence, nor a "native" 
sequence is essential. Preferably the nucleic acid sequence used in the method will 
comprise at least 200-300bp, more preferably at least 300-600bp, of the full length
sequence, but by simple trial and error other fragments (smaller or larger) may be found 
which are functional in altering the characteristics of the plant. It is also known that 
untranslated portions of sequence can suffice to inhibit expression of the homologous gene 
- coding portions may be present within the introduced sequence, but they do not appear 
to be essential under all circumstances. 

The inventors have discovered that there are at least two class A SBE genes in cassava. 
A fragment of a second gene has been isolated, which fragment directs the expression of 
the C terminal 481 amino acids of cassava class A SBE (see Figure 10) and comprises a 
3' untranslated region. The coding portions of the two genes show some differences, and 
that portion of the fragmentary SBE gene may be considered as functionally equivalent to 
the corresponding portion of the nucleotide sequence shown in Figure 4. However, the 
3' untranslated regions of the two genes show marked differences. Thus the method of
altering a host cell may comprise the use of a sufficient portion of either gene so as to 
inhibit the expression of the naturally occurring homologous gene. Conveniently, a 
portion of nucleotide sequence is employed which is conserved between both genes. 
Alternatively, sufficient portions of both genes may be employed, typically using a single 
construct to direct the transcription of both introduced sequences. 

In addition, as explained above, it may be desired to cause inhibition of expression of the 
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences 
are known, including portions of the cassava class B SBE (Salehuzzaman et al., 1994
Plant Science 98, 53-62) and any one of these may prove suitable. Preferably the 
sequence used is that which derives from the host cell sought to be altered (e.g. when 
altering the characteristics of a cassava plant cell, it is generally preferred to use sense or 
anti-sense sequences corresponding exactly to at least portions of the cassava gene whose 
expression is sought to be inhibited). 

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 90% 
sequence identity with the corresponding region of the DNA sequence shown in Figures
4, 9 or 10, operably linked in the sense or anti-sense orientation to a suitable promoter,
said host cell comprising a natural gene sharing sequence homology with the introduced 
sequence. 

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or 
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced 
in a nucleic acid construct, by way of transformation, transduction, micro-injection or 
other method known to those skilled in the art. The invention also provides for a plant 
into which has been introduced a nucleic acid sequence of the invention, or the progeny 
of such a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of 
plant growth and cultivation well-known to those skilled in the art of re-generating 
plantlets from plant cells. 

The invention also provides a method of obtaining starch from an altered plant, the plant 
being obtained by the method defined above. Starch may be extracted from the plant by 
any of the known techniques (e.g. milling). The invention further provides starch 
obtainable from a plant altered by the method defined above, the starch having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 
Conveniently the altered starch is obtained from an altered plant selected from the group 
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and 
rice. Typically the altered starch will have increased amylose content.

The invention will now be further described by way of illustrative examples and with 
reference to the accompanying drawings, in which:- 



Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line 
represents the size of a full length clone with distances in kilobases (kb) and arrows 
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on 
opposite strand). The long thick arrow is the open reading frame with start and stop
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified
either by the plasmid name (shown in brackets above the line) or the clone number (shown 
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE 
clones are positions of small deletions or introns. 

Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence 
is a consensus of 3' RACE pSJ94 and 5' RACE clones 27/9, 11 and 28. The first 64 base
pairs are derived from the RoRidT17 adaptor primer/dT tail followed by the SBE 
sequence. The one long open reading frame is shown in one letter code below the double 
strand DNA sequence. Also shown is the upstream ORF (MQL...LPW). 

Figure 3 shows an alignment of the 5' region of cassava SBE II csbe2con and pSJ99 
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded. 

Figure 4 shows the DNA sequence and predicted ORF of full length cassava SBE II tuber 
cDNA in pSJ107. The sequence shown is from the CSBE214 to the CSBE218 
oligonucleotide. 

Figure 5 shows an alignment of 3' region of cassava SBE II pSJ116 and 125+94 DNA 
sequences. The top line is the 125 + 94 sequence and the bottom the pSJ116 sequence.
Identical nucleotides are indicated by the same letter in the middle line, differences are
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment.

Figure 6 shows an alignment of carboxy terminal region of pSJ116 and 125+94 protein 
sequences. The top sequence is from 125+94 and the bottom from pSJ116. Identical 
amino acid residues are shown with the same letter, conserved changes with a colon and 
neutral changes with a period. 

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of 
each pair of branches represents the distance between sequence pairs. The scale beneath 
the tree measures the distance between sequences (units indicate the number of substitution 
events). Dotted lines indicate a negative branch length because of averaging the tree.
Zfnconl2.pro is maize SBE II, psstbl.pro is pea SBE I (Bhattacharyya et al., 1990 Cell 60,
115-121) and atsbe2-1 & 2-2.pro are two SBE II proteins from Arabidopsis thaliana
(Fisher et al., 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors. 

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter 
code. The top line represents the consensus sequence, below which is shown the 
consensus ruler and the individual SBE II sequences. Residues matching the consensus 
are shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities 
are shown at the right of the figure and are as Figure 7, except that SJ107.pro is cassava 
SBE II.

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated 
by 3' RACE (plasmid pSJ101).

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 3' and 5' RACE (sequence designated 125 + 94 is from plasmid 
pSJ125 and pSJ94, spliced at the CSBE217 oligo sequence). 

Figure 11 is a schematic diagram of the plant transformation vector pSJ64. The black line 
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone 
(containing the origin of replication and bacterial selection marker) and is not shown in 
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left 
border). Relevant restriction enzyme sites are shown above the black line with the 
approximate distances (in kilobases) between sites marked by an asterisk shown
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein
coding regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the
thick arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos
= nopaline synthase promoter).

Example 1

This example relates to the isolation and cloning of SBE II sequences from cassava.

Recombinant DNA manipulations

Standard procedures were performed essentially according to Sambrook et al. (1989 
Molecular cloning A laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA 
sequencer and sequences manipulated using DNASTAR software for the Macintosh. 

Rapid Amplification of cDNA ends (RACE) and PCR conditions

5' and 3' RACE were performed essentially according to Frohman et al. (1988 Proc.
Natl. Acad. Sci. USA 85, 8998-9002) but with the following modifications.

For 3' RACE, 5 ng of total RNA was reverse transcribed using 5 pmol of the RACE
adaptor RoRidT17 as primer and Stratascript RNAse H- reverse transcriptase (50 U) in
a 50 µl reaction according to the manufacturer's instructions (Stratagene). The reaction
was incubated for 1 hour at 37°C and then diluted to 200 µl with TE (10 mM Tris HCl,
1 mM EDTA) pH 8 and stored at 4°C. 2.5 µl of this cDNA was used in a 25 µl PCR
reaction with 12.5 pmol of SBE A and Ro primers for 30 cycles of 94°C 45 sec, 50°C
25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using 1
µl of this reaction as template in a 50 µl reaction under the same conditions. Amplified
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector
(Invitrogen).
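The arithmetic implied by the first-round 3' RACE PCR set-up can be sketched in Python as follows. The function names are hypothetical and serve only to illustrate the calculation; the input figures are those quoted above (12.5 pmol of primer in a 25 µl reaction, a 2.5 µl aliquot of the 200 µl diluted cDNA, and 30 cycles of 45 sec, 25 sec and 1 min 30 sec). The hold time excludes ramping between temperatures, which depends on the thermal cycler.

```python
def primer_concentration_uM(primer_pmol, reaction_volume_ul):
    """Final primer concentration in micromolar.
    1 pmol per microlitre is equal to 1 micromole per litre."""
    return primer_pmol / reaction_volume_ul


def template_fraction(aliquot_ul, total_volume_ul):
    """Fraction of the diluted cDNA preparation used as template."""
    return aliquot_ul / total_volume_ul


def programme_hold_time_s(cycles, step_seconds):
    """Total programmed hold time in seconds, ignoring ramp times."""
    return cycles * sum(step_seconds)


if __name__ == "__main__":
    print(primer_concentration_uM(12.5, 25), "uM of each primer")
    print(template_fraction(2.5, 200), "of the diluted cDNA per reaction")
    print(programme_hold_time_s(30, (45, 25, 90)) / 60, "min of hold time")
```

On these figures each primer is present at 0.5 µM, each reaction receives one eightieth of the diluted cDNA, and the 30 cycles amount to 80 minutes of programmed hold time.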

For the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed as
described above using 10 pmol of the SBE II gene specific primer CSBE22. This primer
was removed from the reaction by diluting to 500 µl with TE and centrifuging twice
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed
with 9 U of terminal deoxynucleotide transferase and 50 µM dATP in a 20 µl reaction in
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at
37°C and 5 min at 65°C and then diluted to 200 µl with TE pH 8. PCR was performed
in a 50 µl volume using 5 µl of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro
and CSBE24 primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min. Amplified
products were separated on a 1% TAE agarose gel and excised, 200 µl of TE was added
and the gel slice was melted at 99°C for 10 min. Five µl of this was re-amplified in a
50 µl volume using CSBE25 and Ri as primers and 25 cycles of 94°C 45 sec, 55°C 25 sec,
72°C 1 min 30 sec. Amplified fragments were separated on a 1% TAE agarose gel, purified
on DEAE paper and cloned into pT7Blue.

The second round of 5' RACE was performed using CSBE28 and 29 primers in the first 
and second round PCR reactions respectively using a new A-tailed cDNA library primed 
with CSBE27. 

A third round of 5' RACE was performed on the same CSBE27 primed cDNA.

Repeat 3' RACE and PCR Cloning

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR
reaction was diluted 1:20 and 1 µl was used in a 50 µl PCR reaction with SBE A and Ri
primers and the products were cloned into pT7Blue. The cloned PCR products were 
screened for the presence or absence of the CSBE23 oligo by colony PCR. 

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA 
(RoRidT17 primed) using primers CSBE214 and CSBE218 from 2.5 µl of cDNA in a 25
µl reaction and 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 2 min.

Complementation of E. coli mutant KV832 

SBE II containing plasmids were transformed into the branching enzyme deficient mutant 
E. coli KV832 (Keil et al., 1987 Mol. Gen. Genet. 207, 294-301) and cells grown on
solid PYG media (0.85% KH2PO4, 1.1% K2HPO4, 0.6% yeast extract) containing 1.0%
glucose. To test for complementation, a loop of cells was scraped off and resuspended
in 150 µl water to which was added 15 µl of Lugol's solution (2 g KI and 1 g I2 per 300
ml water).



RNA isolation

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem.
163, 21-26). Leaf RNA was isolated from 0.5 gm of in vitro grown plant tissue. The
total yield was 300 µg. Three month old roots (88 gm) were used for isolation of root
RNA.



SBE II specific oligonucleotides 


SBE A      ATGGACAAGGATATGTATGA
CSBE21     GGTTTCATGACTTCTGAGCA
CSBE22     TGCTCAGAAGTCATGAAACC
CSBE23     TCCAGTCTCAATATACGTCG
CSBE24     AGGAGTAGATGGTCTGTCGA
CSBE25     TCATACATATCCTTGTCCAT
CSBE26     GGGTGACTTCAATGATGTAC
CSBE27     GGTGTACATCATTGAAGTCA
CSBE28     AATTACTGGCTCCGTACTAC
CSBE29     CATTCCAACGTGCGACTCAT
CSBE210    TACCGGTAATCTAGGTGTTG
CSBE211    GGACCTTGGTTTAGATCCAA
CSBE212    ATGAGTCGCACGTTGGAATG
CSBE213    CAACACCTAGATTACCGGTA
CSBE214    TTAGTTGCGTCAGTTCTCAC
CSBE215    AATATCTATCTCAGCCGGAG
CSBE216    ATCTTAGATAGTCTGCATCA
CSBE217    TGGTTGTTCCCTGGAATTAC
CSBE218    TGCAAGGACCGTGACATCAA
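Several of the oligonucleotides listed above are exact reverse complements of one another, so that the same site can be primed in either direction. The following Python sketch identifies such pairs. The sequences are copied from the list above, on the assumption that spaces appearing within some sequences in the source text are artefacts and can be removed; the function names are hypothetical.

```python
from itertools import combinations

# Oligonucleotides as listed above, written 5' to 3'.
OLIGOS = {
    "SBE A": "ATGGACAAGGATATGTATGA",
    "CSBE21": "GGTTTCATGACTTCTGAGCA",
    "CSBE22": "TGCTCAGAAGTCATGAAACC",
    "CSBE23": "TCCAGTCTCAATATACGTCG",
    "CSBE24": "AGGAGTAGATGGTCTGTCGA",
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE26": "GGGTGACTTCAATGATGTAC",
    "CSBE27": "GGTGTACATCATTGAAGTCA",
    "CSBE28": "AATTACTGGCTCCGTACTAC",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE210": "TACCGGTAATCTAGGTGTTG",
    "CSBE211": "GGACCTTGGTTTAGATCCAA",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
    "CSBE214": "TTAGTTGCGTCAGTTCTCAC",
    "CSBE215": "AATATCTATCTCAGCCGGAG",
    "CSBE216": "ATCTTAGATAGTCTGCATCA",
    "CSBE217": "TGGTTGTTCCCTGGAATTAC",
    "CSBE218": "TGCAAGGACCGTGACATCAA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def complementary_pairs(oligos):
    """List pairs of oligo names whose sequences are exact reverse complements."""
    return [
        (name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(oligos.items(), 2)
        if reverse_complement(seq_a) == seq_b
    ]


if __name__ == "__main__":
    for name_a, name_b in complementary_pairs(OLIGOS):
        print(name_a, "is the reverse complement of", name_b)
```

Four such pairs are found (SBE A with CSBE25, CSBE21 with CSBE22, CSBE29 with CSBE212, and CSBE210 with CSBE213), which is consistent with the use of CSBE25 as a nested primer at the SBE A site in the 5' RACE described above.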



RESULTS 

Cloning of a SBE II gene from cassava leaf 

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several SBE II (class A) DNA sequences
identified a 23 bp region which appears to be completely conserved among most genes 
(data not shown) and is positioned about one kilobase upstream from the 3' end of the 
gene. An oligonucleotide primer (designated SBE A) was made to this sequence and used
to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf cDNA as 
illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned into 
pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a 1120 
bp insert starting with the SBE A oligo and ending with a polyA tail. There was a 
predicted open reading frame of 235 amino acids which was highly homologous (79% 
identical) to a potato SBE II also isolated by the inventors (data not shown) suggesting that
this clone represented a class A (SBE II) gene. 

To obtain the sequence of a full length clone nested primers were made complementary 
to the 5' end of this sequence and used in 5' RACE PCR to isolate clones from the 5' 
region of the gene. A total of three rounds of 5' RACE was needed to determine the
sequence of the complete gene (i.e. one that has a predicted long ORF preceded by stop
codons). It should be noted that during this cloning process several clones (# 23, 9, 16) 
were obtained that had small deletions and in one case (clone 23) there was also a small 
(120 bp) intron present. These occurrences are not uncommon and probably arise through 
errors in the PCR process and/or reverse transcription of incompletely processed RNA 
(heterogeneous nuclear RNA). 

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence 
(designated csbe2con.seq) which contained one long predicted ORF as shown in Figure
2. Several clones in the last round of 5' RACE were obtained which included sequence 
of the untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp 
upstream and out of frame with that of the long ORF. 
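The relationship between a short upstream ORF and a long ORF in a different reading frame can be illustrated with a simple forward-strand ORF scan in Python. The 30-nucleotide sequence below is a hypothetical toy example and is not the csbe2con sequence; the function name and the minimum length parameter are likewise assumptions of this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end, frame) tuples for ORFs on the forward strand.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon.
    'start' is 0-based, 'end' is exclusive and includes the stop codon, and
    'min_codons' is the minimum number of codons before the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


# Toy leader: a short upstream ORF (ATG CAG CTG TAA) followed by a second
# ORF (ATG GCT TCT AAA TGA) that begins in a different reading frame.
toy = "CCATGCAGCTGTAAGATGGCTTCTAAATGA"

if __name__ == "__main__":
    for start, end, frame in find_orfs(toy):
        print("ORF at", start, "to", end, "in frame", frame)
```

In the toy sequence the upstream ORF lies in frame 2 and the downstream ORF in frame 0, so that translation of the upstream ORF cannot read through into the downstream ORF; this is the sense in which the 42 amino acid ORF above is out of frame with the long ORF.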

There is more than one SBE II gene in cassava 

In order to determine if the assembled sequence represented that of a single gene, attempts 
were made to recover by PCR a full length SBE II gene using primers CSBE214 and
CSBE23 at the 5' and 3' ends of the csbe2con sequence respectively. All attempts were
unsuccessful using either leaf or root cDNA as template. The PCR was therefore repeated
with either the 5'- or 3'- most primer and complementary primers along the length of the 
SBE II gene to determine the size of the largest fragment that could be amplified. With 
the CSBE214 primer, fragments could be amplified using primers 210, 28, 27 and 22 in
order of increasing distance, the latter primer pair amplifying a 2.2 kb band. With the 3' 
primer CSBE23, only primer pairs with 21 and 26 gave amplification products, the latter 
being about 1200 bp. These results suggest that the original 3' RACE clone (pSJ94) is 
derived from a different SBE II gene than the rest of the 5' RACE clones even though the
two largest PCR fragments (214+22 and 26+23) overlap by 750 bp and share several 
primer sites. It is likely that the sequence of the two genes starts to diverge around the 
CSBE22 primer site such that the 3' end of the corresponding gene does not contain the 
23 primer and is not therefore able to amplify a cDNA when used with the 214 primer. 

To confirm this, the sequence of the longest 5' PCR fragment (214+22) from two clones 
(#20 designated pSJ99, & #35) was determined and compared to the consensus sequence 
csbe2con as shown in Figure 3. The first 2000 bases are nearly identical (the single base 
changes might well be PCR errors); however, the consensus sequence is significantly 
different after this. This region corresponds to the original 3' RACE fragment pSJ94 
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II 
gene in cassava. 

The 3' end corresponding to pSJ99 was therefore cloned as follows: 3' RACE PCR was 
performed on leaf cDNA using the SBE A oligo as the gene specific primer so that all 
SBE II genes would be amplified. The cloned DNA fragments were then screened for the 
presence or absence of the CSBE23 primer by PCR. Two out of 15 clones were positive 
with the SBE A + Ri primer pair but negative with SBE A + CSBE23 primers. The 
sequence of these two clones (designated pSJ101, as shown in Figure 9) demonstrated that 
they were indeed from an SBE II gene and that they were different from pSJ94. However, 
the overlapping region of pSJ101 (the 3' clone) and pSJ99 (the 5' clone) was identical 
suggesting that they were derived from the same gene. 

To confirm this a primer (CSBE218) was made to a region in the 3' UTR (untranslated 
region) of pSJ101 and used in combination with CSBE214 primer to recover by PCR a full 
length cDNA from both leaf and root cDNA. These clones were sequenced and 
designated pSJ106 & pSJ107 respectively. The sequence and predicted ORF of pSJ107 
is shown in Figure 4. The long ORF in plasmid pSJ106 was found to be interrupted by 
a stop codon (presumably introduced in the PCR process) approximately 1 kb from the 3' 
end of the gene; therefore another cDNA clone (designated pSJ116) was amplified in a 
separate reaction, cloned and sequenced. This clone had an intact ORF (data not shown). 
There were only a few differences between these two sequences (in the transit peptide, aa 27-41: 
YRRTSSCLSFNFKEA in pSJ107 to DRRTSSCLSFIFKKAA in pSJ116, and L831 in pSJ107 to V 
in pSJ116). 
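
The differences between the two transit peptide segments quoted above can be listed programmatically. The sketch below uses Python's standard `difflib` module, which is a general-purpose text matcher rather than a biological alignment tool, so it is offered only as an illustration; the helper name `peptide_differences` is an assumption.

```python
from difflib import SequenceMatcher

def peptide_differences(a, b):
    """List (operation, residues_in_a, residues_in_b) for every non-matching block."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Segments quoted in the text (aa 27-41 of the transit peptide).
seg_a = "YRRTSSCLSFNFKEA"
seg_b = "DRRTSSCLSFIFKKAA"

for tag, res_a, res_b in peptide_differences(seg_a, seg_b):
    print(tag, res_a or "-", res_b or "-")
```

On these two segments the matcher reports three substitutions and one single-residue length difference.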

An additional 740bp of sequence of the gene corresponding to the pSJ94 clone was 
isolated by 5' RACE using the primers CSBE216 and 217, and was designated pSJ125. 
This sequence was combined with that of pSJ94 to form a consensus sequence "125 + 
94", as shown in Figure 10. The sequence of this second gene is about 90% identical at 
the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is clearly a second 
form of SBE II in cassava. The 3' untranslated regions of the two genes are not related 
(data not shown). 
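
A percent identity figure of this kind is normally computed over the aligned, non-gap positions of two sequences. The following is a minimal sketch under that assumption; the function name and the short toy sequences are invented for illustration and do not reproduce the alignments in the Figures.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

# Toy example: 9 of 10 aligned positions agree.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Other conventions exist (for example counting gap columns in the denominator), so a reported value should always state which one was used.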

It was also determined that the full length cassava SBE II genes (from both leaf and tuber) 
actually encode active starch branching enzymes since the cloned genes were able to 
complement the glycogen branching enzyme deficient E. coli mutant KV832. 

Main Findings 

1) A full length cDNA clone of a starch branching enzyme II (SBE II) gene has been 
cloned from leaves and starch storing roots of cassava. This cDNA encodes an 836 amino 
acid protein (Mr 95 kD) and is 86% identical to pea SBE I over the central conserved 
domain. 

2) There is more than one SBE II gene in cassava as a second partial SBE II cDNA was 
isolated which differs slightly in the protein coding region from the first gene and has no 
homology in the 3' untranslated region. 



3) The isolated full length cDNA from both leaves and roots encodes an active SBE as 
it complements an E. coli mutant deficient in glycogen branching enzyme as assayed by 
iodine staining. 
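
Finding 1 can be cross-checked against the coordinates recited in claim 3 (nucleotides 21-2531 of the sequence shown in Figure 4). The sketch below assumes that this range runs from the initiating ATG through the stop codon; that reading, and the helper name `orf_summary`, are assumptions made for illustration.

```python
def orf_summary(first_nt, last_nt, includes_stop=True):
    """Return (length_nt, amino_acids) for a 1-based inclusive coordinate range."""
    length_nt = last_nt - first_nt + 1
    if length_nt % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    codons = length_nt // 3
    # If the range includes the stop codon, one codon does not encode a residue.
    amino_acids = codons - 1 if includes_stop else codons
    return length_nt, amino_acids

print(orf_summary(21, 2531))  # (2511, 836)
```

Under that assumption the range spans 837 codons, i.e. 836 residues plus a stop codon, in agreement with the 836 amino acid protein stated above.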

We have shown that there are SBE II (Class A) gene sequences present in the cassava 
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA 
fragments a consensus sequence of over 3 kb could be compiled which contained one long 
open reading frame (Figure 2) which is highly homologous to other SBE II (class A) genes 
(data not shown). It is likely that the consensus sequence does not represent that of a 
single gene since attempts to PCR a full length gene using primers at the 5' and 3' ends 
of this sequence were not successful. In fact screening of a number of leaf derived 3' 
RACE cDNAs showed that a second SBE II gene (clone designated pSJ101) was also 
expressed which is highly homologous within the coding region to the originally isolated 
cDNA (pSJ94) but has a different 3' UTR. A full length SBE II gene was isolated from 
leaves and roots by PCR using a new primer to the 3' end of this sequence and the 
original sequence at the 5' end of the consensus sequence. If the frequency of clones 
isolated by 3' RACE PCR reflects the abundance of the mRNA levels then this full length 
gene may be expressed at lower levels in the leaf than the pSJ94 clone (2 out of 15 were 
the former class, 13/15 the latter). It should be noted that each class is expressed in both 
leaves and roots as judged by PCR (data not shown). Sequence analysis of the predicted 
ORF of the leaf and root genes showed only a few differences (4 amino acid changes and 
one deletion) which could have arisen through PCR errors or, alternatively, there may be 
more than one nearly identical gene expressed in these tissues. 
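
The clone counts quoted above (2 of 15 versus 13 of 15) are small, so any expression estimate drawn from them is rough. As a hedged illustration, the sketch below converts the counts to proportions and attaches an approximate 95% Wilson score interval; the choice of interval and the helper name are assumptions, not part of the original analysis.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Clone counts reported in the text for the leaf-derived 3' RACE screen.
counts = {"pSJ101 class": 2, "pSJ94 class": 13}
total = sum(counts.values())

for name, k in counts.items():
    low, high = wilson_interval(k, total)
    print(f"{name}: {k}/{total} = {k / total:.1%} "
          f"(approx. 95% interval {low:.0%} to {high:.0%})")
```

The wide intervals illustrate why the text treats the inferred difference in mRNA abundance as tentative.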



A comparison of all known SBE II protein sequences shows that the cassava SBE II gene 
is most closely related to the pea gene (Figure 8). The two proteins are 86.3% identical 
over a 686 amino acid range which extends from the triple proline "elbow" (Burton et al., 
1995 Plant J. 7, 3-15) to the conserved WYA sequence immediately preceding the C- 
terminal extensions (data not shown). All SBE II proteins are conserved over this range 
in that they are at least 80% similar to each other. Remarkably however, the sequence 
conservation between the pea, potato and cassava SBE II proteins also extends to the N- 
terminal transit peptide, especially the first 12 amino acids of the precursor protein and 
the region surrounding the mature terminus of the pea protein (AKFSRDS). Because the 
proteins are so similar around this region it can be predicted that the mature terminus of 
the cassava SBE II protein is likely to be GKSSHES. The precursor has a predicted 
molecular mass of 96 kD and the mature protein a predicted molecular mass of 91.3 kD. 
The cassava SBE II has a short acidic tail at the C-terminal although this is not as long or 
as acidic as that found in the pea or potato proteins. The significance of this acidic tail, 
if any, remains to be determined. One notable difference between the amino acid 
sequence of cassava SBE II and all other SBE II proteins is the presence of the sequence 
NSKH at around position 697 instead of the conserved sequence DAD/EY. Although this 
conserved region forms part of a predicted α-helix (number 8) of the catalytic (β/α)8 barrel 
domain (Burton et al., 1995, cited previously), this difference does not abolish the SBE 
activity of the cassava protein as this gene can still complement the glycogen branching 
deletion mutant of E. coli. It may however affect the specificity of the protein. An 
interesting point is that the other cassava SBE II clone pSJ94 has the conserved sequence 
DADY. 



One other point of interest concerning the sequence of the SBE II gene is the presence of 
an upstream ATG in the 5' UTL. This ATG could initiate a small peptide of 42 amino 
acids which would terminate downstream of the predicted initiating methionine codon of 
the SBE II precursor. If this does occur then the translation of the SBE II protein from this 
mRNA is likely to be inefficient as ribosomes normally initiate at the 5'-most ATG in the 
mRNA. However, the first ATG is in a poorer Kozak context than the SBE II initiator and 
it may be too close to the 5' end of the message to initiate efficiently (14 nucleotides), thus 
allowing initiation to occur at the correct ATG. 
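
The frame and overlap relationships described here follow from simple arithmetic. The sketch below assumes that the "46 bp upstream" figure given earlier is the distance from the upstream ATG to the SBE II initiator ATG; that reading is an assumption, as is the helper name `uorf_relation`.

```python
def uorf_relation(offset_bp, uorf_codons):
    """Relate an upstream ORF to a main ORF whose ATG lies offset_bp downstream.

    Returns (in_frame, overlaps_main_start).
    """
    in_frame = offset_bp % 3 == 0
    # Coding length of the upstream ORF, excluding its stop codon.
    uorf_coding_bp = uorf_codons * 3
    overlaps_main_start = uorf_coding_bp > offset_bp
    return in_frame, overlaps_main_start

in_frame, overlaps = uorf_relation(offset_bp=46, uorf_codons=42)
print(in_frame, overlaps)  # False True
```

With these values the 42-codon peptide (126 nt of coding sequence) is out of frame with, and would read past, the main initiator codon, consistent with the description above.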

In conclusion we have shown that cassava does have SBE II gene sequences, that they are 
expressed in both leaves and tubers and that more than one gene exists. 

Example 2 

Construction of plant transformation vectors and transformation of cassava with 
antisense starch branching enzyme genes. 

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 



An 1100 bp Hind III - Sac I fragment of cassava SBE II (from plasmid pSJ94) was cloned 
into the Hind III - Sac I sites of the plant transformation vector pSJ64 (Fig 11). This 
placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter 
and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the binary 
vector pGPTV-HYG (Becker et al., 1992 Plant Molecular Biology 20: 1195-1197) 
modified by inclusion of an approximately 750 bp fragment of pJIT60 (Guerineau et al., 
1992 Plant Mol. Biol. 18, 815-818) containing the duplicated cauliflower mosaic virus 
(CaMV) 35S promoter (Cabb-JI strain, equivalent to nucleotides 7040 to 7376 duplicated 
upstream of 7040 to 7433, as described by Frank et al., 1980 Cell 21, 285-294) to replace 
the GUS coding sequence. A similar construct was made with the cassava SBE II 
sequence from plasmid pSJ101. 
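
An insert placed in the antisense orientation is transcribed as the reverse complement of the coding strand. The following minimal sketch shows that operation; the 24 nt example fragment, flanked by the Hind III (AAGCTT) and Sac I (GAGCTC) recognition sequences, is invented for illustration and is not part of the cassava SBE II sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

# Hypothetical fragment: Hind III site, filler sequence, Sac I site.
sense_fragment = "AAGCTTGGCACAGTTTCAGAGCTC"
antisense_fragment = reverse_complement(sense_fragment)
print(antisense_fragment)  # GAGCTCTGAAACTGTGCCAAGCTT
```

Both recognition sequences are palindromic, so they are still present after reverse complementation but in the opposite order relative to the promoter.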

These plasmids are then introduced into Agrobacterium tumefaciens LBA4404 by a direct 
DNA uptake method (An et al., Binary vectors, In: Plant Molecular Biology Manual (ed 
Gelvin and Schilperoort) AD 1988 pp 1-19) and can be used to transform cassava somatic 
embryos by selecting on hygromycin as described by Li et al. (1996, Nature 
Biotechnology 14, 136-140). 

Claims 

1. A nucleic acid sequence encoding a polypeptide having starch branching enzyme 
(SBE) activity, the encoded polypeptide comprising at least an effective portion of amino 
acid residues 1-836 of the sequence shown in Figure 4. 

2. A nucleic acid sequence according to claim 1, encoding a polypeptide comprising 
amino acid residues 1-836 of the sequence shown in Figure 4. 

3. A nucleic acid sequence according to claim 1, comprising nucleotides 21-2531 of the 
nucleic acid sequence shown in Figure 4, or a functionally equivalent nucleotide sequence 
which hybridises under stringent hybridisation conditions with the nucleic acid sequence 
shown in Figure 4. 

4. A nucleic acid sequence according to any one of claims 1, 2 or 3 comprising a 5' 
and/or a 3' untranslated region. 

5. A nucleic acid sequence according to any one of the preceding claims, encoding a 
polypeptide having the amino acid sequence NSKH at about residue 697. 

6. A nucleic acid sequence comprising at least 200bp and exhibiting at least 90% 
sequence identity with the corresponding region of the DNA sequence shown in Figures 
4, 9 or 10, operably linked in the sense or anti-sense orientation to a promoter operable 
in plants. 

7. A nucleic acid sequence according to claim 6, comprising at least 300-600bp. 

8. A sequence according to claim 6 or 7, comprising a 5' and/or 3' untranslated region. 

9. A sequence according to claim 8, comprising nucleotides 688-1044 of the sequence 
shown in Figure 9, and/or nucleotides 1507-1900 of the sequence shown in Figure 10. 

10. A sequence according to claim 6, comprising the nucleotide sequence shown in Figure 

11. A replicable nucleic acid construct comprising a nucleic acid sequence according to 
any one of the preceding claims. 

12. A polypeptide having SBE activity and comprising an effective portion of amino acid 
residues 1-836 of the amino acid sequence shown in Figure 4. 

13. A polypeptide according to claim 12, in substantial isolation from other polypeptides. 

14. A polypeptide according to claim 12 or 13, having the amino acid sequence NSKH 
at about position 697. 

15. A method of modifying starch in vitro, the method comprising treating starch to be 
modified under suitable conditions with an effective amount of a polypeptide according to 
any one of claims 12, 13 or 14. 

16. A method of altering a plant host cell, the method comprising introducing into the cell 
a nucleic acid sequence comprising at least 200bp and exhibiting at least 90% sequence 
identity with the corresponding region of the DNA sequence shown in Figures 4, 9 or 10, 
operably linked in the sense or anti-sense orientation to a suitable promoter active in the 
host cell, and causing transcription of the introduced nucleotide sequence, said transcript 
and/or the translation product thereof being sufficient to interfere with the expression of 
a homologous gene naturally present in the host cell, which homologous gene encodes a 
polypeptide having SBE activity. 

17. A method according to claim 16, wherein the host cell is from a cassava, banana, 
potato, pea, tomato, maize, wheat, barley, oat, sweet potato or rice plant. 

18. A method according to claim 16 or 17, comprising the introduction of one or more 
further nucleic acid sequences, operably linked in the sense or anti-sense orientation to a 
suitable promoter active in the host cell, and causing transcription of the one or more 
further nucleic acid sequences, said transcripts and/or translation products thereof being 
sufficient to interfere with the expression of homologous gene(s) present in the host cell. 

19. A method according to claim 18, wherein the one or more further nucleic acid 
sequences interfere with the expression of a gene involved in starch biosynthesis. 

20. A method according to claim 18 or 19, wherein the further nucleic acid sequence 
comprises at least part of an SBE I gene. 

21. A method according to claim 20, wherein the further nucleic acid sequence comprises 
at least part of the cassava SBE I gene. 

22. A method according to any one of claims 16-21, wherein the host cell is selected 
from one of the following: cassava, banana, potato, pea, tomato, maize, wheat, barley, 
oat, sweet potato or rice. 

23. A method according to any one of claims 16-22, wherein the altered host cell gives 
rise to starch having different properties compared to starch from an unaltered cell. 

24. A method according to any one of claims 16-23, further comprising the step of 
growing the altered host cell into a plant or plantlet. 

25. A method of obtaining starch having altered properties, comprising growing a plant 
from an altered host cell according to the method of claim 24, and extracting the starch 
therefrom. 

26. A plant or plant cell into which has been artificially introduced a nucleic acid 
sequence comprising at least 200bp and exhibiting at least 90% sequence identity with the 
corresponding region of the DNA sequence shown in Figures 4, 9 or 10, operably linked 
in the sense or anti-sense orientation to a promoter operable in plants, or the progeny 
thereof. 

27. A plant according to claim 24, altered by the method of any one of claims 16-22. 

28. Starch obtainable from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

29. Starch obtained from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

30. Starch according to claim 28 or 29 obtained from an altered plant selected from the 
group consisting of:- cassava, banana, potato, pea, tomato, maize, wheat, barley, oat, 
sweet potato and rice plants. 

31. Starch according to any one of claims 28, 29 or 30, having increased amylose content 
compared to starch extracted from an equivalent but unaltered plant. 

ABSTRACT 



Title : Improvements in or Relating to Starch Content of Plants 

Disclosed is a nucleic acid sequence encoding a polypeptide having starch branching 
enzyme (SBE) activity, the encoded polypeptide comprising an effective portion of amino 
acid residues 1-836 of the sequence shown in Figure 4. 

[Figure 2: nucleotide sequence of csbe2con.seq (both strands) with the predicted ORF translation and restriction sites marked, including Nco I, Xmn I, Hind III, Bgl II, Xho I, Nde I, Nsi I, EcoR V, BamH I, Hinc II, Ssp I and Sca I.]

[Figure 4: nucleotide sequence of pSJ107 (both strands) with the predicted ORF translation and restriction sites marked, including Nco I, Xmn I, Hind III, Hinc II, Nsi I, Bgl II, Xho I, Bcl I and EcoR V.]

•r60 rlO +B0 ^-90 flOO <-1 10 ^-1Pn 

125*94. seq TAGTTTTGGGTACCATGTCACAAACTTTTTTGCACCTAGCAGCCGAT7TGGAACTCCTGATGATTTGAAG 
JAGTTT i GGGTA CA GTCACAAACTTTT TGCA CTAGCAGCCGATTTGGAACTCCTGATGATTT AAG 
1 1o - Se * I^JJ' TGGG [ A ^ACGTCACAAACTTTTATGCAGCTAGCAGCCGATTTGGAA 

™ M150 ^1 160 ^1 170 -M180 *-1190 *-1200 

*-130 *140 ^150 ^160 ^170 ^180 Joo 

125-94. seq TCTTTAATAGATAAAGCTCATGAGTTAGGGCTGCTTGTTCTCATGGATATTGTTCATAGCCATGCGTrAA 
TAATAGATAAAGCTCA GAGTTAGG CT CTTGTTCTCATGGATATTGTTCATa1"a™c TCaI 

M? 20 1230 1240 *1250 ^1260 ^1270 

^200 ^210 ,r220 «-230 «-240 ^250 ^-260 

125*94. seq ATAATACGTTGGATGGGCTGAACATGTTTGATGGTACGGATAGTCACTACTTCCACTCCGGATCACGGGG 
TAATACGTTGGATGGGCTGAA ATGTTTGATGGTACGGAT GTCACTACTT CACTC GGA CACGGGG 

^1280 M290 *-1300 ^1310 *-1320 *-1330 *-1340 

♦"270 ^280 ^290 ^300 ^310 ,-320 JIo 

125*94. seq TCATC ATTGGTTGTGGGACTCTCGCCTT7TCAACTATGGAAGCTGGGAGGTGCTAAGATTTCTTCTTTCA 
TCATCATTGG TGTGGGACTC CGCCTTTTCAACTATGG AGCTGGGAGGT CTAAG TTTCTTCTTTCA 

6 - SSq I^ A J GATTGGA I G J^GACTCCCGCCTTTTCAACTATGGGAGCTGGGAGGTTCTAAGGTTTCTTCTTTCA 

*-1350 *-1360 ^1370 *-1380 *-1390 *-1400 "M410 

^ 35 ° ^60 ^370 ^380 ^390 ^400 

125*94. seq AATGCAAGATGGTGGTTGGAAGAGTACAGGTTTGATGGTTTTAGATTTGATGGGGTGACTTCCATGATGT 

AATGCAAG TGGTGGTTGGA GAGTACA GTTTGATGG TT AGATTTGA GGGGTGACTTC ATGATGT 
116. seq AAtGCAAGGTGGTGGTTGGATGAGTACAAGTTTGATGGGTTCAGATTTGACGGGGTGACTTCAATGATGT 

^-1420 *-1430 *-1440 -1450 ^1460 ^1470 ^1480 

<-410 ?H20 ^430 *-440 ^450 ^460 vr470 

125*94. seq ACACTCCCCATGGGTTGCAGGTAGCTTTTACTGGCAACTACAATGAGTACTTTGGATATGCAACTGATGT 

A "C C CATGG TTGCAGGTAG TTTTAC GGCAACTACAATGA TACT7TGGATATGCAACTGATGT 
1 16. seq ACACCCA7CA7GGA77GCAGG7AGA7777ACCGGCAAC7ACAA7GAA7AC777GGA7A7GC AAC7GA7G7 

M490 *-1500 ^1510 ^1520 ^1530 M540 *-1550 

^480 -490 ^500 ^510 T -520 -530 ^540 

125*94. seq AGA7GC i G7GA777A777GA7GC77G7GAA7GA7A7GA77CACGG7C7777CCC7GAGGC7G77ACCA77 

AGATGC7G7G 777A777GA7GC7 7GAA7GA7A7GA77CA GG7C7 77CCC GAGGCTG7 ACCA77 
1 16. seq AGA7GC7G7GG777A777GA7GC7G77GAA7GA7A7GA77CA7GG7C7C77CCCAGAGGC7G7CACCA77 

*-1560 *-1570 M580 *-1590 *-1600 ^1610 *M620 

<-550 *-560 *-570 ,-580 <-590 ^-500 <r610 

125*94. seq GG7GAAGA7G77AGCGGAAAGCCAACA7777GCA77CCAG7GGAAGA7GG7GG7G77GGA777GA77ACC 

GG7GAAGA7G77AG GGAA GCCAACA 77TGCA77CC G7 GAAGA7GG7GG7G77GG 777GA77A C 
1 1o. seq GG7GAAGA7G77AG7GGAA7GCCAACAG777GCA77CCGG77GAAGA7GG7GG7G77GGC777GA77A7C 

^1630 ^1640 ^1650 ^1660 ^1670 *1680 <-1690 

«o C o„ *" 620 * 630 ^ 6Z, ° ^ 50 *-660 T -670 ^680 

i 25*94. seq G7C7CCACA7GGCCA77GCCGA7AAA7GGA77GAGA77C77AAGAAGAGAGA7GAGGAC7GGAAAA7GGG 
G7C7CCACA7GGC 77GC GA7AAA7GG 77GAGA77 77 AGAAGAGAGA7GA GA 7GGAAAA7GGG 
1 1o. seq G7C7CCACA7GGC7G77GC7GA7AAA7GGG77GAGA77A77CAGAAGAGAGA7GAAGA77GGAAAA7GGG 

^1700 ^1710 «M720 M730 <M740 ^1750 ^1760 

690 700 710 720 730 740 750

125*94. seq TGACATTGTGCATACACTCACCAACAGAAGGTGGTTGGAAAAATGTGTTGCTTATGCTGAAAGTCATGAC

TGACATTGT CATA CT ACCAACAG GGTGGTTGGAAAA TGTGTT CTTATGCTGAAAGTCATGAC
116. seq TGACATTGTACATATGCTGACCAACAGGCGGTGGTTGGAAAAGTGTGTTTCTTATGCTGAAAGTCATGAC

1770 1780 1790 1800 1810 1820 1830

760 770 780 790 800 810 820

125*94. seq CAAGCTCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGACATGTACGACTTCATGGCTC

CA GC CTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGA ATGTA GACTTCATGGCTC
116. seq CAGGCCCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGATATGTATGACTTCATGGCTC

1840 1850 1860 1870 1880 1890 1900

830 840 850 860 870 880 890

125*94. seq GTGACAGACCATCTACTCCTCTTATAGATCGTGGAATAGCATTGCACAAAATGATCAGGCTTATTACCAT
TGACAGACCATCTAC CCTCT ATAGATCGTGGA TAGCATTGCACAAAATGATCAGGCTTATTACCAT
116. seq TTGACAGACCATCTACCCCTCTCATAGATCGTGGAGTAGCATTGCACAAAATGATCAGGCTTATTACCAT

1910 1920 1930 1940 1950 1960 1970

900 920 930 940 950 960

125*94. seq GGGCTTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACATCCTGAGTGGATTGATTTT

GGG TTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACA CC GAGTGGATTGATTTT
116. seq GGGATTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACACCCCGAGTGGATTGATTTT

1980 1990 2000 2010 2020 2030 2040






. .6. , eq CCg« AGGTGt TCT g c A TCTTCCC gGTGG T t . 3 TTT STT C^ 

1110 1120 1130 1140 1150 1160 1170

1180 1190 1200 1210 1220 1230 1240

1250 1260 1270 1280 1290 1300 1310

125*94. seq CAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGGACTCGGATGATGGCTTGTT

C GATTACCGAGTTGGCTGCTT AAG C AGGAAAGTACAAGAT GT TTGGA TC GATGAT TTGTT

116. seq CGGATTACCGAGTTGGCTGCTTAAAGCCAG^ 

Mia ,? 3 3 3? Tsto :?n§ ;? 3 3 6 7 s :? 3 3 ?g 1SS 

n,seq I---T T GGCAGGCTTAGTCATGATGC« 

1390 1400 1410 1420 1430 1440 1450

125*94. seq CGGTCCTTCATGGTATATGCACCA^^ 

CG "CCiTlATGGT TA CACCAl TAG ACAGCAGTGGTC ATGCTTTAGT GA GATGAAC 
1 lo. seq CGATCCTTCATGGTGTACA^ . 

2470 2480 2490 2500 2510 2520 2530




125-94. pro SFGYHVTNFFAPSSRFGTPDDLKSLIDKAHELGLLVLMDIVHSHASNNTLDGLNMFDGTDSHYFHSGSRG

f^YHVTNF: A: SSRFGTPODLKSL I DKAHELGLL VLMD VhIhAS N^S^nSfS^S H^hIg RG 
116. pro SFGYHVTNFYAASSRFGTPDDLKSLIDKAHELGL^^ 

%o %o 100 H, ° H20 " L43 ° 

125-94. pro HHWLWDSRLFNYGSWEVLRFLLSNARWWLEEYRFDGFRFDGVTSMMYTPHGLQVAFTGNYNEYFGYATDV

HHW:WDSRLFNYGSWEVLRFLLSNARWWL:EY:FDGFRFDGVTSMMYT HGLQV FTGNYNEYFGYATDV
116. pro HHWMWDSRLFNYGSWEVLRFLLSNARWWLDEYKFDGFRFDGVTS^ 

°150 5 160 170 M8 ° H9 ° * 500 

125-94. pro DAVIYLMLVNDMIHGLFPEAVTIGEDVSGKPTFCIPVEDGGVGFDYRLHMAIADKWIEILKKRDEDWKMG

DAV: YLML: NDM I HGLFPEAVT I GEDVSG. PT C I P VEDGG VGFO YRLHMA - ADKW - E I • • KRDEDWKM^ 
116. pro DAVVYLMLLNDMIHGLFPEAVTIGEDVSGMPTVCIPVEDGGVGFDYRLHMAVADKWVEIIQKRDEDWKMG
510 520 530 540 550 560 570

220 230 240 250 260 270 280

n,\m T ™^F VAYAESHDQA ^ 

DIVH LTNRRWLEKCV: YAESHDQAL VGDKTI AFWLMDKDMYDFMA DRPSTPL I DRG ALHKM I Rl I TM 
116. pro DIVHMLTNRRWLEKCVSYAESHDQALVGDKTIAFWLMDKDMYDFMALDRPSTPLIDRGVALHKMIRLITM
580 590 600 610 620 630 640

290 300 310 320 330 340 350

125-94. pro GLGGEGYLNFMGNEFGHPEWIDFPRGDRHLPNGKVIPGNNHSYDKCRRRFDLGDADYLRYHGMQEFDQAM
GLGGEGYLNFMGNEFGHPEWIDFPRGD HLP: GK : PGNN. SYDKCRRRFDLG LR YHGMQEFDOA 

116. pro GLGGEGYLNFMGNEFGHPEWIDFPRGDLHLPSGKFVPGNNYSYDKCRRRFDLGNSKRLRYHGMQEFDQAI

650 660 670 680 690 700 710

360 370 380 390 400 410 420

125-94. pro QHLEEAYGFMTSEHQYISRKDEGDRIIVFERGNLVFVFNFHWTNSYSDYRVGCFKSGKYKIVLDSDDGLF

Bcl I Nco I

ATGGACAAGGATATGTATGACTTCATGGCTCTTGACAGACCATCTACTCCTCTCATAGATCGTGGAGTAGCATTGCACAAAATGATCAGGCTTATTACCA
TACCTGTTCCTATACATACTGAAGTACCGAGAACTGTCTGGTAGATGAGGAGAGTATCTAGCACCTCATCGTAACGTGTTTTACTAGTCCGAATAATGGT
MDKDMYDFMALDRPSTPLIDRGVALHKMIRLIT

TGGGATTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACACCCCGAGTGGATTGATTTTCCAAGAGGTGATCTACATCTTCCCAGTGG

200

ACCCTAATCCGCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTGTGGGGCTCACCTAACTAAAAGGTTCTCCACTAGATGTAGAAGGGTCACC

MGLGGEGYLNFMGNEFGHPEWIDFPRGDLHLPSG



EcoR V Bcl I

TAAATTTGTTCCTGGGAACAATTACAGTTATGATAAATGCCGGCGTAGGTTTGATCTAGGCAATTCAAAGCGTCTGAGATATCATGGAATGCAAGAGTTT 
300

ATTTAAACAAGGACCCTTGTTAATGTCAATACTATTTACGGCCGCATCCAAACTAGATCCGTTAAGTTTCGCAGACTCTATAGTACCTTACGTTCTCAAA 

KFVPGNNYSYDKCRRRFDLGNSKRLRYHGMQEF

GATCAAGCAATTCAGCATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAATACATATCACGGAAGGATGAAAGGGATCGGATCATTGTCTTCG 
400

CTAGTTCGTTAAGTCGTAGAACTTCTTCGGATACCAAAGTACTGAAGACTCGTGGTTATGTATAGTGCCTTCCTACTTTCCCTAGCCTAGTAACAGAAGC
DQAIQHLEEAYGFMTSEHQYISRKDERDRIIVF

AGAGGGGAAACCTCGTTTTTGTATTCAATTTTCATTGGACTAGCAGCTATTCGGATTACCGAGTTGGCTGCTTAAAGCCAGGAAAGTACAAGATAGTCTT 
500

TCTCCCCTTTGGAGCAAAAACATAAGTTAAAAGTAACCTGATCGTCGATAAGCCTAATGGCTCAACCGACGAATTTCGGTCCTTTCATGTTCTATCAGAA 

ERGNLVFVFNFHWTSSYSDYRVGCLKPGKYKIVL

GGATTCAGATGATCCTTTGTTTGGAGGCTTTGGCAGGCTTAGTCATGATGCAGAGCACTTCAGCTTTGAAGGGTGGTACGATAACCGGCCTCGATCCTTC

600

CCTAAGTCTACTAGGAAACAAACCTCCGAAACCGTCCGAATCAGTACTACGTCTCGTGAAGTCGAAACTTCCCACCATGCTATTGGCCGGAGCTAGGAAG

DSDDPLFGGFGRLSHDAEHFSFEGWYDNRPRSF

ATGGTGTACACACCATGTAGAACAGCAGTGGTCTATGCTTTAGTGGAGGATGAAGTGGAGAATGAAGTGGAACCTGTCGCCGGTTAAGATATATCTTAGC

700

TACCACATGTGTGGTACATCTTGTCGTCACCAGATACGAAATCACCTCCTACTTCACCTCTTACTTCACCTTGGACAGCGGCCAATTCTATATAGAATCG 
MVYTPCRTAVVYALVEDEVENEVEPVAG. 



AACAGGTTCTGAAGCAGGAATGCCATTATTGATCTTCCTATGTGCATCTGCGTTGAACGAAATATATTGAGCCTATAATTTGATGTCACGGTCCTTGCAG

800

TTGTCCAAGACTTCGTCCTTACGGTAATAACTAGAAGGATACACGTAGACGCAACTTGCTTTATATAACTCGGATATTAAACTACAGTGCCAGGAACGTC

ATTTCCATCCTGGTTCTTGGTATTTTGTTGTCATGATAAACATAATCAAAGACCAATAGGAAACGCAGGGTTACATGCTAGCTTCCATCATCATAGGGAG

900

TAAAGGTAGGACCAAGAACCATAAAACAACAGTACTATTTGTATTAGTTTCTGGTTATCCTTTGCGTCCCAATGTACGATCGAAGGTAGTAGTATCCCTC 
Sac I 

CTCAGACCTCCTAAACCATAAATCTTCAAGCTGCCTGCGTTCGGTAGTATGTTATGTGGTACTTTGCAATCTTAAATTATCATGATCGCTGTGGATGCTA
1000

GAGTCTGGAGGATTTGGTATTTAGAAGTTCGACGGACGCAAGCCATCATACAATACACCATGAAACGTTAGAATTTAATAGTACTAGCGACACCTACGAT 

ACTATGACAATTTTGTATATATGCCAACGAGGATTTTAAGTTTTAAAAAAAAAACAAAAAAAATCCATG
1069

TGATACTGTTAAAACATATATACGGTTGCTCCTAAAATTCAAAATTTTTTTTTTGTTTTTTTTAGGTAC 






Cla I



Kpn I 



TATGGATTGACATCGATAATACGACTCACTATAGGGATTTTTTTTTTTTTTTTTTTTTGTAGTTTTGGGTACCATGTCACAAACTTTTTTGCACCTAGCA 

100

ATACCTAACTGTAGCTATTATGCTGAGTGATATCCCTAAAAAAAAAAAAAAAAAAAAACATCAAAACCCATGGTACAGTGTTTGAAAAAACGTGGATCGT 

SFGYHVTNFFAPS 

GCCGATTTGGAACTCCTGATGATTTGAAGTCTTTAATAGATAAAGCTCATGAGTTAGGGCTGCTTGTTCTCATGGATATTGTTCATAGCCATGCGTCAAA 
200

CGGCTAAACCTTGAGGACTACTAAACTTCAGAAATTATCTATTTCGAGTACTCAATCCCGACGAACAAGAGTACCTATAACAAGTATCGGTACGCAGTTT
SRFGTPDDLKSLIDKAHELGLLVLMDIVHSHASN

TAATACGTTGGATGGGCTGAACATGTTTGATGGTACGGATAGTCACTACTTCCACTCCGGATCACGGGGTCATCATTGGTTGTGGGACTCTCGCCTTTTC 
300

ATTATGCAACCTACCCGACTTGTACAAACTACCATGCCTATCAGTGATGAAGGTGAGGCCTAGTGCCCCAGTAGTAACCAACACCCTGAGAGCGGAAAAG 

NTLDGLNMFDGTDSHYFHSGSRGHHWLWDSRLF 

AACTATGGAAGCTGGGAGGTGCTAAGATTTCTTCTTTCAAATGCAAGATGGTGGTTGGAAGAGTACAGGTTTGATGGTTTTAGATTTGATGGGGTGACTT
400

TTGATACCTTCGACCCTCCACGATTCTAAAGAAGAAAGTTTACGTTCTACCACCAACCTTCTCATGTCCAAACTACCAAAATCTAAACTACCCCACTGAA 
NYGSWEVLRFLLSNARWWLEEYRFDGFRFDGVT
Nco I Sca I

CCATGATGTACACTCCCCATGGGTTGCAGGTAGCTTTTACTGGCAACTACAATGAGTACTTTGGATATGCAACTGATGTAGATGCTGTGATTTATTTGAT 
500

GGTACTACATGTGAGGGGTACCCAACGTCCATCGAAAATGACCGTTGATGTTACTCATGAAACCTATACGTTGACTACATCTACGACACTAAATAAACTA 
SMMYTPHGLQVAFTGNYNEYFGYATDVDAVIYLM
GCTTGTGAATGATATGATTCACGGTCTTTTCCCTGAGGCTGTTACCATTGGTGAAGATGTTAGCGGAAAGCCAACATTTTGCATTCCAGTGGAAGATGGT 

600

CGAACACTTACTATACTAAGTGCCAGAAAAGGGACTCCGACAATGGTAACCACTTCTACAATCGCCTTTCGGTTGTAAAACGTAAGGTCACCTTCTACCA 

LVNDMIHGLFPEAVTIGEDVSGKPTFCIPVEDG

GGTGTTGGATTTGATTACCGTCTCCACATGGCCATTGCCGATAAATGGATTGAGATTCTTAAGAAGAGAGATGAGGACTGGAAAATGGGTGACATTGTGC 
700

CCACAACCTAAACTAATGGCAGAGGTGTACCGGTAACGGCTATTTACCTAACTCTAAGAATTCTTCTCTCTACTCCTGACCTTTTACCCACTGTAACACG 
GVGFDYRLHMAIADKWIEILKKRDEDWKMGDIV
ATACACTCACCAACAGAAGGTGGTTGGAAAAATGTGTTGCTTATGCTGAAAGTCATGACCAAGCTCTTGTTGGTGACAAAACTATTGCATTTTGGCTGAT



800 



TATGTGAGTGGTTGTCTTCCACCAACCTTTTTACACAACGAATACGACTTTCAGTACTGGTTCGAGAACAACCACTGTTTTGATAACGTAAAACCGACTA
HTLTNRRWLEKCVAYAESHDQALVGDKTIAFWLM

Bcl I Nco I

GGACAAGGACATGTACGACTTCATGGCTCGTGACAGACCATCTACTCCTCTTATAGATCGTGGAATAGCATTGCACAAAATGATCAGGCTTATTACCATG 
900

CCTGTTCCTGTACATGCTGAAGTACCGAGCACTGTCTGGTAGATGAGGAGAATATCTAGCACCTTATCGTAACGTGTTTTACTAGTCCGAATAATGGTAC 

DKDMYDFMARDRPSTPLIDRGIALHKMIRLITM

GGCTTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACATCCTGAGTGGATTGATTTTCCAAGAGGGGATCGACATCTGCCCAATGGTA

1000

CCGAATCCGCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTGTAGGACTCACCTAACTAAAAGGTTCTCCCCTAGCTGTAGACGGGTTACCAT 

GLGGEGYLNFMGNEFGHPEWIDFPRGDRHLPNG



EcoR V Bcl I

AAGTAATTCCAGGGAACAACCACAGTTATGATAAATGCCGTCGTAGATTTGATCTAGGTGATGCAGACTATCTAAGATATCATGGAATGCAAGAGTTTGA

TTCATTAAGGTCCCTTGTTGGTGTCAATACTATTTACGGCAGCATCTAAACTAGATCCACTACGTCTGATAGATTCTATAGTACCTTACGTTCTCAAACT

KVIPGNNHSYDKCRRRFDLGDADYLRYHGMQEFD

TCAGGCAATGCAACATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAGTATATATCACGGAAGGATGAAGGAGATCGGATCATTGTCTTTGAG

1200

AGTCCGTTACGTTGTAGAACTTCTTCGGATACCAAAGTACTGAAGACTCGTGGTCATATATAGTGCCTTCCTACTTCCTCTAGCCTAGTAACAGAAACTC



QAMQHLEEAYGFMTSEHQYISRKDEGDRIIVFE



AGGGGAAACCTTGTTTTTGTATTCAACTTTCATTGGACTAACAGCTATTCAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGG

1300

TCCCCTTTGGAACAAAAACATAAGTTGAAAGTAACCTGATTGTCGATAAGTCTAATGGCTCAACCGACGAAGTTCAGTCCTTTCATGTTCTAACAAAACC 



RGNLVFVFNFHWTNSYSDYRVGCFKSGKYKIVL



ACTCGGATGATGGCTTGTTTGGAGGCTTCAACAGGCTTAGTCATGATGCCGAGCACTTCACCTTTGACGGGTGGTATGATAACCGGCCTCGGTCCTTCAT

1400

TGAGCCTACTACCGAACAAACCTCCGAAGTTGTCCGAATCAGTACTACGGCTCGTGAAGTGGAAACTGCCCACCATACTATTGGCCGGAGCCAGGAAGTA
DSDDGLFGGFNRLSHDAEHFTFDGWYDNRPRSFM

GGTATATGCACCATCTAGGACAGCAGTGGTCCATGCTTTAGTAGAAGATGAAGAGAATGAAGCAGAGAATGAAGTAGAAAGTGAAGTGAAACCAGCCTCC
1500

CCATATACGTGGTAGATCCTGTCGTCACCAGGTACGAAATCATCTTCTACTTCTCTTACTTCGTCTCTTACTTCATCTTTCACTTCACTTTGGTCGGAGG

VYAPSRTAVVHALVEDEENEAENEVESEVKPAS

BamH I Hinc II

GGCTGAGATAGATATTTAGTAAGAGGATCCCCTAAAGCAGGAATGGTTAACCTGTGCATCTGCATTGAACGACGTATATTGAGACTTGAATTGATTTGCT
1600

CCGACTCTATCTATAAATCATTCTCCTAGGGGATTTCGTCCTTACCAATTGGACACGTAGACGTAACTTGCTGCATATAACTCTGAACTTAACTAAACGA
G



Ssp I 



Nsi I 
Bcl I



GCTCAGGACACAGAATATTAATTCCAAGGCTCAAGGCAGAGATACACGCCATAATGCATGATCATATGAAAGCTCCCCAACTTGTAAATCATTTAGCAAG
1700

CGAGTCCTGTGTCTTATAATTAAGGTTCCGAGTTCCGTCTCTATGTGCGGTATTACGTACTAGTATACTTTCGAGGGGTTGAACATTTAGTAAATCGTTC



Sca I Nco I

CTGCGTGCACTCTGTAAATTATATGTAGTACTTTGGCAAGTCACGTTATTATGGATACCATGGATGTCCGCTAGGAAAAATTTTGTGTATACGCCTACTA 

1800

GACGCACGTGAGACATTTAATATACATCATGAAACCGTTCAGTGCAATAATACCTATGGTACCTACAGGCGATCCTTTTTAAAACACATATGCGGATGAT



Xmn I 

GGATTTTTAAATCTCGCATGTTCCACATAAAGTGGTGGTTGAATGTTGCGCGACTATTTTTGAGTAAAATGATTGAAGTTATTCTTCACTTGGGCCTGTG

1900

CCTAAAAATTTAGAGCGTACAAGGTGTATTTCACCACCAACTTACAACGCGCTGATAAAAACTCATTTTACTAACTTCAATAAGAAGTGAACCCGGACAC



AAAAAAAAAAAAAAAAAAA 

1919

TTTTTTTTTTTTTTTTTTT 